Search results for "information extraction"
showing 10 items of 25 documents
Translingual text mining for identification of language pair phenomena
2016
Translingual Text Mining (TTM) is an innovative natural language processing technology for building multilingual parallel corpora, processing machine translation, contextual knowledge acquisition, information extraction, query profiling, language modeling, contextual word sensing, creating feature test sets, and a variety of other purposes. The Keynote Lecture will discuss opportunities and challenges of this computational technology. In particular, it will focus on the identification of language pair phenomena and their application to building a holistic language model, a novel tool for processing machine translation, supporting professional translations, evaluation of tran…
The HisClima database: historical weather logs for automatic transcription and information extraction
2021
Knowing the weather and atmospheric conditions of the past can help weather researchers generate models like the ones used to predict how weather conditions are likely to change as global temperatures continue to rise. Many historical weather records, registered on a systematic basis, are available. Historical weather logs were kept on ships while they were on the high seas, recording daily weather conditions such as wind speed, temperature, coordinates, etc. These historical documents are an important source of knowledge from which climatic information from several centuries ago can be extracted. This paper presents a database for researching about the ca…
Human-in-the-Loop Conversation Agent for Customer Service
2020
This paper describes a prototype system for partial automation of customer service operations of a mobile telecommunications operator with a human-in-the-loop conversational agent. The agent consists of an intent detection system for identifying the types of customer requests that it can handle appropriately, a slot filling information extraction system that integrates with the customer service database for a rule-based treatment of the common scenarios, and a template-based language generation system that builds response candidates that can be approved or amended by customer service operators. The main focus of this paper is on the system architecture and machine learning system structure …
BIOfid dataset: publishing a German gold standard for named entity recognition in historical biodiversity literature
2019
The Specialized Information Service Biodiversity Research (BIOfid) has been launched to mobilize valuable biological data from printed literature hidden in German libraries over the past 250 years. In this project, we annotate German texts converted by OCR from historical scientific literature on the biodiversity of plants, birds, moths and butterflies. Our work enables the automatic extraction of biological information previously buried in the mass of papers and volumes. For this purpose, we generated training data for the tasks of Named Entity Recognition (NER) and Taxa Recognition (TR) in biological documents. We use this data to train a number of leading machine learning tools and c…
Diversity in random subspacing ensembles
2004
Ensembles of learnt models constitute one of the main current directions in machine learning and data mining. It was shown experimentally and theoretically that in order for an ensemble to be effective, it should consist of classifiers having diversity in their predictions. A number of ways are known to quantify diversity in ensembles, but little research has been done on their appropriateness. In this paper, we compare eight measures of the ensemble diversity with regard to their correlation with the accuracy improvement due to ensembles. We conduct experiments on 21 data sets from the UCI machine learning repository, comparing the correlations for random subspacing ensembles with diffe…
Extraction of Medical Terms for Word Sense Disambiguation within Multilingual Framework
2016
All languages belonging to the same language family share a number of common characteristics, called language pair phenomena, which can be quite useful when processing them for multilingual purposes such as translation across cognate languages, building dictionaries, thesauri, and transcript collections, or multilingual text retrieval of digital documents. In addition, it is estimated that more than 30% of English vocabulary has been inherited from Latin, which has dominated medical terminology in particular. We use this fact by exploring word sense disambiguation (WSD) in a multilingual environment. Specifically in the medical domain, language pair phenomena can be limite…
FrameNet CNL: A Knowledge Representation and Information Extraction Language
2014
The paper presents a FrameNet-based information extraction and knowledge representation framework, called FrameNet-CNL. The framework is used on natural language documents and represents the extracted knowledge in a tailor-made Frame-ontology from which unambiguous FrameNet-CNL paraphrase text can be generated automatically in multiple languages. This approach brings together the fields of information extraction and CNL, because a source text can be considered as belonging to FrameNet-CNL if an information extraction parser produces the correct knowledge representation as a result. We describe a state-of-the-art information extraction parser used by a national news agency and speculate that Fram…
Embedded controlled language to facilitate information extraction from eGov policies
2015
The goal of this paper is to propose a system that can extract formal semantic knowledge representation from natural language eGov policies. We present an architecture that allows for extracting Controlled Natural Language (CNL) statements from heterogeneous natural language texts with the ability to support multilinguality. The approach is based on the concept of embedded CNLs.
Cueing animations: Dynamic signaling aids information extraction and comprehension
2013
The effectiveness of animations containing two novel forms of animation cueing that target relations between event units rather than individual entities was compared with that of animations containing conventional entity-based cueing or no cues. These relational event unit cues (progressive path and local coordinated cues) were specifically designed to support key learning processes posited by the Animation Processing Model (Lowe & Boucheix, 2008). Four groups of undergraduates (N = 84) studied a user-controllable animation of a piano mechanism and then were assessed for mental model quality (via a written comprehension test) and knowledge of the mechanism’s dynamics (via a novel non-verbal …
Extracting Semantic Knowledge from Unstructured Text Using Embedded Controlled Language
2016
Nowadays, most of the data on the Web is still in the form of unstructured text. Knowledge extraction from unstructured text is highly desirable but extremely challenging due to the inherent ambiguity of natural language. In this article, we present an architecture of an information extraction system based on the concept of Embedded Controlled Language that allows for extracting formal semantic knowledge from an unstructured text corpus. Moreover, the presented approach has the potential to support multilingual input and output.